UKP at CrossLink2: CJK-to-English Subtasks

Authors

  • Jungi Kim
  • Iryna Gurevych
Abstract

This paper describes UKP’s participation in the cross-lingual link discovery task at NTCIR-10 (CrossLink2). The task addressed in our work is to find valid anchor texts in a Chinese, Japanese, or Korean (CJK) Wikipedia page and retrieve the corresponding target Wiki pages in English. The CrossLink framework builds on our previous CrossLink system, which worked in the opposite direction of the language pairs, i.e. it discovered anchor texts in English Wikipedia pages and their corresponding targets in the CJK languages. The framework consists of anchor selection, anchor ranking, anchor translation, and target discovery sub-modules. Each sub-module has been shown to work well both in monolingual settings and for English-to-CJK language pairs. We seek to find out whether the approach that worked very well for English-to-CJK still works for CJK-to-English. We use the same experimental settings as in our previous participation, and our experimental runs show that the CJK-to-English CrossLink task is much harder than the English-to-CJK one when the same resources are used.
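The abstract names four sub-modules but gives no implementation detail. As a rough illustration only, the stages could be chained as in the toy Python sketch below. Every name and data value here (ANCHOR_PROB, TRANSLATIONS, ENGLISH_TITLES, the dictionary-lookup strategy, the scores) is a hypothetical assumption for illustration and is not taken from the UKP system.

```python
from dataclasses import dataclass

# Hypothetical toy resources (assumptions, not from the paper).
ANCHOR_PROB = {"東京": 0.9, "日本": 0.8, "首都": 0.3}   # assumed anchor scores
TRANSLATIONS = {"東京": "Tokyo", "日本": "Japan"}       # assumed translation table
ENGLISH_TITLES = {"Tokyo", "Japan", "Capital city"}     # assumed target page titles


@dataclass
class Link:
    anchor: str
    score: float
    target: str


def select_anchors(text, max_len=4):
    """Anchor selection: substrings (up to max_len chars) that are known anchors."""
    found = []
    for i in range(len(text)):
        for j in range(i + 1, min(len(text), i + max_len) + 1):
            cand = text[i:j]
            if cand in ANCHOR_PROB and cand not in found:
                found.append(cand)
    return found


def rank_anchors(anchors):
    """Anchor ranking: order candidates by their assumed score, best first."""
    return sorted(anchors, key=lambda a: ANCHOR_PROB[a], reverse=True)


def translate_anchor(anchor):
    """Anchor translation: look the anchor up in the toy translation table."""
    return TRANSLATIONS.get(anchor)


def discover_target(translated):
    """Target discovery: accept the translation only if an English page has that title."""
    return translated if translated in ENGLISH_TITLES else None


def cross_link(text):
    """Chain the four stages and return the links that survive all of them."""
    links = []
    for anchor in rank_anchors(select_anchors(text)):
        english = translate_anchor(anchor)
        if english is None:
            continue  # no translation available, so no target can be found
        target = discover_target(english)
        if target is not None:
            links.append(Link(anchor, ANCHOR_PROB[anchor], target))
    return links


if __name__ == "__main__":
    for link in cross_link("東京は日本の首都"):
        print(link.anchor, link.score, link.target)
```

In this sketch an anchor with no entry in the translation table produces no link, which is one simple way to picture how missing translation resources could limit the CJK-to-English direction.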

Similar resources

UKP at CrossLink: Anchor Text Translation for Cross-lingual Link Discovery

This paper describes UKP’s participation in the cross-lingual link discovery (CLLD) task at NTCIR-9. The given task is to find valid anchor texts from a new English Wikipedia page and retrieve the corresponding target Wiki pages in Chinese, Japanese, and Korean languages. We have developed a CLLD framework consisting of anchor selection, anchor ranking, anchor translation, and target discovery ...

DCU at NTCIR-10 Cross-lingual Link Discovery (CrossLink-2) Task

DCU participated in the English to Chinese (E2C) and Chinese to English (C2E) subtasks of the NTCIR-10 CrossLink2 Cross-lingual Link Discovery (CLLD) task. Our strategy for each query involved extracting potential link anchors as n-gram strings, cleaning of potential anchor strings, and anchor expansion and ranking to select a set of anchors for the query. Potential anchors were translated usin...

CJK Experiments with Hummingbird SearchServer™ at NTCIR-5

Hummingbird submitted ranked result sets for the Chinese, Japanese and Korean Single Language Information Retrieval subtasks of the Cross-Lingual Information Retrieval Task of the 5th NII-NACSIS Test Collection for IR Systems Workshop (NTCIR-5). For short Chinese (title) queries, a decompounded word-based approach produced higher (statistically significant) mean average precision and first relev...

NTCIR-10 CrossLink-2 Task: A Link Mining Strategy

At NTCIR-10 we participated in the cross-lingual link discovery (CrossLink-2) task. In this paper we describe our systems for discovering cross-lingual links between the Chinese, Japanese, and Korean (CJK) Wikipedia and the English Wikipedia. The evaluation results show that our implementation of the cross-lingual linking method achieved promising results.

Automated Cross-lingual Link Discovery in Wikipedia

At NTCIR-9, we participated in the cross-lingual link discovery (Crosslink) task. In this paper we describe our approaches to discovering Chinese, Japanese, and Korean (CJK) cross-lingual links for English documents in Wikipedia. Our experimental results show that a link mining approach that mines the existing link structure for anchor probabilities and relies on the “translation” using cross-l...

Publication date: 2013